Comparison of Two Contextual Post-Processing Algorithms for Text Recog

Authors

  • Jonathan J. Hull
  • Sargur N. Srihari
Abstract

The binary n-gram and Viterbi algorithms are alternative approaches to contextual post-processing for text produced by a noisy channel such as an optical character recognizer. The paper describes the underlying theory of each approach in unified terminology, and presents a storage-efficient data structure for the binary n-gram algorithm and a recursive formulation for the Viterbi algorithm. The relative merits of the two methods, based on extensive experiments with each algorithm, are given.

1. INTRODUCTION

A number of programs that use some form of contextual knowledge to post-process garbled text produced by a noisy channel, such as an optical character recognition (OCR) machine, are described in the literature. Among these one can discern two different types of representation of context: statistical and structural. This paper describes our experiences with two contextual post-processing algorithms, one based on each type of representation.

Statistical representation of context consists of models of word generation processes, e.g., a Markovian model of the text source, whose origins are in information theory. The particular algorithm of this class that we consider is the Viterbi algorithm [3] and its modifications [1,6].

The structural approach, which is closer to the artificial intelligence framework, is based on a deterministic representation of context; examples are implicit or explicit representations of a lexicon, syntax or semantics. The method of this class that we consider is the binary n-gram algorithm [2,5], which represents the syntax of a lexicon in the form of a set of binary arrays.

2. BINARY n-GRAM ALGORITHM
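The idea of representing a lexicon as a set of binary arrays can be illustrated with a minimal sketch. It assumes positional binary digrams (n = 2), one common form of the technique: for each word length and each pair of positions (i, j), an entry is 1 if some lexicon word has that letter pair at those positions. This is not the storage-efficient data structure the paper presents, which is not shown in this excerpt. The names `build_binary_digrams`, `is_admissible` and `suspect_positions` are hypothetical, and Python sets stand in for the binary arrays.

```python
from itertools import combinations


def build_binary_digrams(lexicon):
    """Build positional binary digram tables from a list of words.

    Each key is (word_length, i, j); the value is the set of letter
    pairs seen at positions i and j. Membership in the set plays the
    role of a 1 entry in a binary array (an assumption of this sketch).
    """
    tables = {}
    for word in lexicon:
        for i, j in combinations(range(len(word)), 2):
            tables.setdefault((len(word), i, j), set()).add((word[i], word[j]))
    return tables


def is_admissible(word, tables):
    """Return True if every positional digram of `word` has entry 1."""
    return all(
        (word[i], word[j]) in tables.get((len(word), i, j), set())
        for i, j in combinations(range(len(word)), 2)
    )


def suspect_positions(word, tables):
    """Return positions common to all violated digrams (likely error sites)."""
    violated = [
        {i, j}
        for i, j in combinations(range(len(word)), 2)
        if (word[i], word[j]) not in tables.get((len(word), i, j), set())
    ]
    return set.intersection(*violated) if violated else set()


# Toy lexicon, invented for illustration only.
tables = build_binary_digrams(["cat", "cot", "dog", "dot"])
print(is_admissible("cot", tables))      # a lexicon word passes
print(is_admissible("cag", tables))      # a garbled word is rejected
print(suspect_positions("cag", tables))  # positions shared by all violations
```

A word passes the check only if all of its digrams are marked as seen. A string outside the lexicon can still pass, so the check detects errors rather than confirming that a word is correct. Intersecting the violated position pairs is one simple way to localize a single substitution error.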




Publication date: 1982